Trained Ternary Quantization

Authors

Chenzhuo Zhu

Song Han

Huizi Mao

William J. Dally

Abstract

Deep neural networks are widely used in machine learning applications. However, large neural network models can be difficult to deploy on mobile devices with limited power budgets. To solve this problem, we propose Trained Ternary Quantization (TTQ), a method that reduces the precision of weights in neural networks to ternary values. This method causes very little accuracy degradation and can even improve the accuracy of some models (32-, 44-, and 56-layer ResNets on CIFAR-10, and AlexNet on ImageNet). Our AlexNet model is trained from scratch, so training it is as easy as training a normal full-precision model. A key feature of our trained quantization method is that it learns both the ternary values and the ternary assignments. During inference, only ternary values (2-bit weights) and scaling factors are needed, so our models are nearly 16× smaller than full-precision models. Our ternary models can also be viewed as sparse binary weight networks, which can potentially be accelerated with custom circuits. Experiments on CIFAR-10 show that the ternary models obtained by the trained quantization method outperform full-precision ResNet-32, ResNet-44, and ResNet-56 models by 0.04%, 0.16%, and 0.36%, respectively. On ImageNet, our model outperforms the full-precision AlexNet model by 0.3% in Top-1 accuracy and outperforms previous ternary models by 3%.
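
The storage claim in the abstract (2-bit weights plus scaling factors, nearly 16× smaller than 32-bit weights) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration, not taken from the abstract: the threshold rule (a fraction `t` of the largest absolute weight), the use of two scaling factors per layer (`w_pos`, `w_neg`), the fixed example values for those factors (the method described above learns them during training), and the function names `ternarize`, `pack_2bit`, and `compression_ratio`.

```python
def ternarize(weights, w_pos, w_neg, t=0.05):
    """Map each weight to one of {-w_neg, 0, +w_pos}.

    Assumption: weights whose magnitude is below t * max|w| become zero.
    Returns the ternary codes (-1, 0, 1) and the dequantized values.
    """
    delta = t * max(abs(w) for w in weights)
    codes = []
    for w in weights:
        if w > delta:
            codes.append(1)
        elif w < -delta:
            codes.append(-1)
        else:
            codes.append(0)
    values = [w_pos if c == 1 else -w_neg if c == -1 else 0.0 for c in codes]
    return codes, values


def pack_2bit(codes):
    """Pack ternary codes into bytes, four 2-bit codes per byte."""
    mapping = {0: 0b00, 1: 0b01, -1: 0b10}
    packed = bytearray((len(codes) + 3) // 4)
    for i, c in enumerate(codes):
        packed[i // 4] |= mapping[c] << (2 * (i % 4))
    return bytes(packed)


def compression_ratio(num_weights, num_scales=2):
    """Size of 32-bit weights divided by size of 2-bit codes plus 32-bit scales."""
    full_bits = 32 * num_weights
    ternary_bits = 2 * num_weights + 32 * num_scales
    return full_bits / ternary_bits


# Hypothetical layer with eight weights and hand-picked scaling factors.
weights = [0.8, -0.02, 0.01, -0.6, 0.3, -0.9, 0.0, 0.05]
codes, values = ternarize(weights, w_pos=0.5, w_neg=0.7)
packed = pack_2bit(codes)
ratio = compression_ratio(1_000_000)
```

For a layer with a million weights the ratio falls just under 16, because the scaling factors add a small fixed overhead on top of the 2-bit codes. The zero codes are also what makes the "sparse binary weight network" view possible: only the nonzero positions carry a sign.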

Similar resources

Mixed Low-precision Deep Learning Inference using Dynamic Fixed Point

We propose a cluster-based quantization method to convert pre-trained full precision weights into ternary weights with minimal impact on the accuracy. In addition we also constrain the activations to 8-bits thus enabling sub 8-bit full integer inference pipeline. Our method uses smaller clusters of N filters with a common scaling factor to minimize the quantization loss, while also maximizing t...

Ternary Neural Networks with Fine-Grained Quantization

We propose a novel fine-grained quantization method for ternarizing pre-trained full precision models, while also constraining activations to 8-bits. Using this method, we demonstrate minimal loss in classification accuracy on state-of-the-art topologies without additional training. This enables a full 8-bit inference pipeline, with best reported accuracy using ternary weights on ImageNet datas...

Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights

This paper presents incremental network quantization (INQ), a novel method that aims to efficiently convert any pre-trained full-precision convolutional neural network (CNN) model into a low-precision version whose weights are constrained to be either powers of two or zero. Unlike existing methods, which suffer from noticeable accuracy loss, our INQ has the potential to resolve this issue,...

Performance Estimation for Lowpass Ternary Filters

Ternary filters have tap values limited to −1, 0, or +1. This restriction in tap values greatly simplifies the multipliers required by the filter, making ternary filters very well suited to hardware implementations. Because they incorporate coarse quantisation, their performance is typically limited by tap quantisation error. This paper derives formulae for estimating the achievable performance...

Compressing Low Precision Deep Neural Networks Using Sparsity-Induced Regularization in Ternary Networks

A low precision deep neural network training technique for producing sparse, ternary neural networks is presented. The technique incorporates hardware implementation costs during training to achieve significant model compression for inference. Training involves three stages: network training using L2 regularization and a quantization threshold regularizer, quantization pruning, and finally retr...

Journal title:

CoRR

Volume: abs/1612.01064

Publication date: 2016
